Figeno: multi-region genomic figures with long-read support

Abstract
Summary: The vast amount of publicly available genomic data requires analysis and visualization tools. Here, we present figeno, an application for generating publication-quality FIgures for GENOmics. Figeno particularly focuses on multi-region views across genomic breakpoints and on long reads with base modifications. In addition, we support epigenomic data including ATAC-seq, ChIP-seq or HiC, as well as whole genome sequencing data with copy numbers and structural variants.
Availability and implementation: Figeno is available as a Python package with both a command line and graphical user interface. It can be installed via PyPI and the source code is available at https://github.com/CompEpigen/figeno.


Introduction
Recent advances in sequencing have led to the generation of many different types of genomic and epigenomic data, which require visualization tools for facilitating analysis and interpretation. Important requirements for such tools are support for various data types and large datasets, and the ability to generate publication-quality images. The Integrative Genomics Viewer (IGV) (Robinson et al. 2011) enables interactive exploration of genomics data, but lacks support for some data types like chromatin conformation (e.g. HiC). Furthermore, IGV has limited export capabilities, making it subpar for generating publication-quality figures. Other tools are geared toward a specific data type, e.g. trackplot (Mayakonda and Westermann 2024) for bigwig files, while others like Gviz (Hahne and Ivanek 2016) or PyGenomeTracks (Lopez-Delisle et al. 2021) can handle more data types. An important limitation of most genomic visualization software is that it only displays one genomic region at a time, which is admittedly sufficient for many use cases but, for example, precludes the visualization of novel chromatin interactions across breakpoints in HiC data. NeoLoopFinder (Wang et al. 2021) has a visualization component which enables the visualization of HiC data across breakpoints, but it is not primarily a visualization tool. Third generation sequencing technologies (Oxford Nanopore Technologies, Pacific Biosciences) provide long reads which can be phased to each parental haplotype, as well as base modification information. Most visualization software does not support long reads, while new tools such as methplotlib (De Coster et al. 2020) and methylartist (Cheetham et al. 2022) have been specifically designed to display base modifications from long reads, but lack other common features such as support for bigwig and HiC data. Here, we introduce figeno, an application for visualizing genomics and epigenomics data, including long reads, which allows multiple regions to be visualized simultaneously.
Figeno: a flexible, versatile, and user-friendly solution for genomic plotting

Figeno overview
Figeno is a visualization tool for genomic data tracks (Fig. 1A). Figeno's key features are (i) the support of many data types; (ii) multi-region views; (iii) the support for base modifications from long reads; (iv) export into various image formats (see Supplementary Table S1 for a comparison to other visualization software). Of the ten types of tracks that can be visualized with figeno (Fig. 1B), the first three tracks serve general purposes: "chr_axis" specifies the genomic coordinates, "genes" shows annotated genes in the region, and "bed" allows for additional custom genomic annotations. Gene files are provided for the reference genomes hg19, hg38, and mm10, but users can also provide custom annotation files for further reference genomes (in NCBI RefSeq or GTF format). "bigwig" tracks allow for the visualization of epigenetic data types including ChIP-seq or ATAC-seq, and "hic" tracks can be used to visualize HiC data (.cool format). For alignment files in bam format, figeno visualizes either the alignments themselves, the coverage, or the base modification frequency (see section "Support for long-read data"). Finally, figeno supports the visualization of whole genome sequencing data with tracks for copy number and structural variants. The figures can be exported in bitmap (png) or vector (svg, pdf) graphics formats for direct use in publications or further processing with a vector graphics editor. A configuration file in JSON format serves as input to figeno. In addition to custom generation of the configuration file, we provide a graphical user interface (GUI) for interactive figure configuration (Supplementary Fig. S1).
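
As a rough illustration of what such a configuration file might contain, the sketch below assembles a configuration as a Python dictionary and serializes it to JSON. The track type names ("chr_axis", "genes", "bigwig") and the "horizontal" layout are taken from the text; the top-level keys ("general", "output", "regions", "tracks"), the per-track "file" field, the file names and the coordinates are assumptions made for illustration only, and the figeno documentation should be consulted for the actual schema.

```python
import json


def build_config(regions, bigwig_path, output_path, reference="hg38"):
    """Assemble an illustrative figeno-style configuration.

    The key names used here are assumptions, not a verified schema.
    """
    return {
        "general": {"reference": reference, "layout": "horizontal"},
        "output": {"file": output_path},
        # One entry per genomic region; several entries give a multi-region view.
        "regions": [{"chr": c, "start": s, "end": e} for c, s, e in regions],
        # Tracks are listed from top to bottom.
        "tracks": [
            {"type": "bigwig", "file": bigwig_path},
            {"type": "genes"},
            {"type": "chr_axis"},
        ],
    }


# Hypothetical region and file names, chosen only for the example.
config = build_config([("7", 156_000_000, 157_000_000)], "signal.bw", "figure.svg")
config_json = json.dumps(config, indent=2)
```

The resulting JSON text could then be written to a file and supplied to figeno through its command line or GUI.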

Multi-region plotting
Figeno can plot one or multiple regions simultaneously. An important feature is that, although some tracks will be plotted independently for each region, other tracks can show interactions across regions: HiC tracks show chromatin interactions across genomic regions (Fig. 1C), structural variant (SV) tracks can show breakpoints across regions (Fig. 1E), alignment tracks can link multiple alignments from split reads aligned to different regions (Supplementary Fig. S2), and scales in bigwig and coverage tracks can automatically be adjusted across regions.
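
The idea of adjusting a scale across regions can be sketched as follows. This is a minimal illustration rather than figeno's actual implementation: the function name `shared_scale` is hypothetical, and anchoring the lower bound at zero is an assumption that suits non-negative signals such as coverage.

```python
def shared_scale(values_per_region):
    """Return one (ymin, ymax) pair that covers the values of every region.

    values_per_region: list of lists of signal values, one list per region.
    """
    all_values = [v for values in values_per_region for v in values]
    if not all_values:
        # No data at all: fall back to an arbitrary unit scale.
        return (0.0, 1.0)
    # Assumption: keep zero in view, and only go lower for negative signals.
    return (min(0.0, min(all_values)), max(all_values))


# Two regions with different signal ranges end up on the same axis limits.
limits = shared_scale([[0.5, 2.0], [1.0, 7.5]])
```

With a shared scale, peak heights can be compared directly between regions, whereas independent scales would stretch each region to its own maximum.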

Support for long-read data
Figeno implements features unique to long-read sequencing data, including grouping alignments by haplotype [if the HP tag is set in the bam file, e.g. by WhatsHap (Martin et al. 2016)] and coloring by base modification (if the MM and ML tags are provided in the bam file). In addition to the most common base modification (5mC), any base modification (e.g. 5hmC, 6mA) can be visualized. Furthermore, two base modifications can be visualized at the same time (e.g. 5mC and 5hmC). We also support a "basemod_freq" track to specify the base modification frequency at each position in a haplotype-aware manner. The alignments and "basemod_freq" tracks can display allele-specific DNA methylation. The leukemic cell line GDM-1 harbors a translocation between chromosomes six and seven, which activates MNX1 by enhancer hijacking (Weichenhan et al. 2023). By visualizing allele-specific methylation in this region using figeno, we observed that the wild-type allele is methylated at the MNX1 promoter while the rearranged allele is hypomethylated (Fig. 1D). Finally, reads aligning to different genomic regions ("split reads") can be visualized by a line connecting all alignments from the same read (Supplementary Fig. S2).
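
To make the notion of a haplotype-aware base modification frequency concrete, the sketch below computes, for each haplotype and position, the fraction of reads in which the base is called as modified. It is an illustration under stated assumptions and not figeno's implementation: the per-read calls are given as plain tuples (in practice they would be derived from the HP, MM and ML tags), the function name `basemod_frequency` is hypothetical, and the 0.5 probability threshold is an arbitrary choice.

```python
from collections import defaultdict


def basemod_frequency(calls, threshold=0.5):
    """Compute the modification frequency per (haplotype, position).

    calls: iterable of (haplotype, position, probability) tuples, one per
    read and position, where probability is the modification probability.
    """
    modified = defaultdict(int)
    total = defaultdict(int)
    for haplotype, position, probability in calls:
        key = (haplotype, position)
        total[key] += 1
        # Assumption: a base counts as modified at or above the threshold.
        if probability >= threshold:
            modified[key] += 1
    return {key: modified[key] / total[key] for key in total}


# Hypothetical calls at one CpG: three reads on haplotype 1, two on haplotype 2.
calls = [
    (1, 100, 0.9), (1, 100, 0.8), (1, 100, 0.1),
    (2, 100, 0.2), (2, 100, 0.05),
]
freq = basemod_frequency(calls)
```

In this toy example the position is mostly methylated on haplotype 1 and unmethylated on haplotype 2, which is the kind of allele-specific pattern described for the MNX1 promoter.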

Customizable figure layouts
The default layout is "horizontal" and results in the regions being arranged horizontally from left to right. This layout is best-suited for most applications, but we also offer several other layouts that can be particularly useful for plotting whole genome sequencing data. First, the "circular" layout can be used to display all regions on a circle, e.g. to show all chromosomes in a circos plot (Fig. 1E). For settings where we only want to look at 2-8 chromosomes with breakpoints between them, we also provide a novel "symmetrical" layout, where the regions are aligned on two rows, but the order of the tracks in the top row is reversed, which is particularly useful when an SV track is displayed between the two rows (Supplementary Fig. S3).

Implementation and availability
Figeno is implemented in Python and uses matplotlib (Hunter 2007) for plotting. It relies on pysam (Li et al. 2009) for reading bam files, pybigwig for reading bigwig files, cooler (Abdennur and Mirny 2020) for reading HiC data, and vcfpy for reading vcf files. The GUI was created using the JavaScript framework React and utilizes a local Flask-based webserver. The time required to generate a figure depends on the number and types of tracks, on the size of the genomic regions being visualized and on the computer being used, but figeno generally only takes a couple of seconds to generate a figure. The code is completely open-source and is available on GitHub (http://github.com/CompEpigen/figeno) under the GPL-3 license; the present manuscript describes figeno version 1.2.0.

Conclusion
Taken together, figeno is a rich and user-friendly visualization tool for genomics and epigenomics data, especially for structural rearrangements and long-read data. It supports an extensive collection of input formats (Supplementary Table S1) and provides a GUI for enhanced usability by a broad range of users, from experienced bioinformaticians to beginners. To support them, we also provide extensive and detailed documentation on ReadTheDocs (https://figeno.readthedocs.io/).

Figure 1. Figeno overview and example outputs. (A) Overview of figeno. (B) List of all 10 track types available within figeno. (C) HiC data around a breakpoint in the LNCaP cell line (Wang et al. 2021), and DNase-seq bigwig track from the LNCaP cell line, downloaded from the ENCODE project (ENCSR000EKT). (D) Allele-specific DNA methylation at the MNX1 locus in the GDM-1 cell line, based on nanopore sequencing data. For the alignments track, red indicates a methylated CpG site and blue unmethylated. (E) Circular plot showing copy numbers and structural variants for the THP-1 cell line [data from the cancer cell line encyclopedia (Ghandi et al. 2019)]. Copy-number gains are indicated in red and losses in blue.